Computer Learns to Write Its ABCs

By Hugo Angel,

Photo-illustration: Danqing Wang
A new computer model can now mimic the human ability to learn new concepts from a single example instead of the hundreds or thousands of examples it takes other machine learning techniques, researchers say.

The new model learned how to write invented symbols from the animated show Futurama as well as dozens of alphabets from across the world. It also showed it could invent symbols of its own in the style of a given language. The researchers suggest their model could also learn other kinds of concepts, such as speech and gestures.

Although scientists have made great advances in machine learning in recent years, people remain much better at learning new concepts than machines.

“People can learn new concepts extremely quickly, from very little data, often from only one or a few examples. You show even a young child a horse, a school bus, a skateboard, and they can get it from one example,” says study co-author Joshua Tenenbaum at the Massachusetts Institute of Technology. In contrast, “standard algorithms in machine learning require tens, hundreds or even thousands of examples to perform similarly.”

To speed up machine learning, researchers sought to develop a model that better mimicked human learning, which generalizes from very few examples of a concept. They focused on learning simple visual concepts: handwritten symbols from alphabets around the world.

“Our work has two goals: to better understand how people learn — to reverse engineer learning in the human mind — and to build machines that learn in more humanlike ways,” Tenenbaum says.

Whereas standard pattern recognition algorithms represent symbols as collections of pixels or arrangements of features, the new model the researchers developed represented each symbol as a simple computer program. For instance, the letter “A” is represented by a program that generates examples of that letter stroke by stroke when the program is run. No programmer is needed during the learning process — the model generates these programs itself.

Moreover, each program is designed to generate variations of each symbol whenever the programs are run, helping it capture the way instances of such concepts might vary, such as the differences between how two people draw a letter.
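To make the symbol-as-program idea concrete, here is a hypothetical, much-simplified sketch: a toy generator, not the authors' actual Bayesian Program Learning model, in which a "concept" is a function that redraws its strokes with a little noise on every run. The stroke coordinates and jitter parameter are invented for illustration.

```python
import random

def make_symbol_program(strokes):
    """Wrap a list of stroke templates (polylines of (x, y) points)
    in a program that generates noisy variations each time it runs."""
    def generate(jitter=0.05):
        # Each run perturbs the control points slightly, mimicking the
        # natural variation between two handwritten examples of a letter.
        return [
            [(x + random.uniform(-jitter, jitter),
              y + random.uniform(-jitter, jitter)) for (x, y) in stroke]
            for stroke in strokes
        ]
    return generate

# The letter "A" as three strokes: two diagonals and a crossbar.
letter_a = make_symbol_program([
    [(0.0, 0.0), (0.5, 1.0)],    # left diagonal
    [(0.5, 1.0), (1.0, 0.0)],    # right diagonal
    [(0.25, 0.5), (0.75, 0.5)],  # crossbar
])

sample1 = letter_a()  # each call yields a slightly different "A"
sample2 = letter_a()
```

Running `letter_a()` twice produces two variants with the same stroke structure but different point coordinates, which is the property the model exploits when judging whether two drawings are instances of the same concept.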

“The idea for this algorithm came from a surprising finding we had while collecting a data set of handwritten characters from around the world. We found that if you ask a handful of people to draw a novel character, there is remarkable consistency in the way people draw,” says study lead author Brenden Lake at New York University. “When people learn or use or interact with these novel concepts, they do not just see characters as static visual objects. Instead, people see richer structure — something like a causal model, or a sequence of pen strokes — that describes how to efficiently produce new examples of the concept.”

The model also applies knowledge from previously learned concepts to speed the learning of new ones. For instance, the model can use knowledge learned from the Latin alphabet to learn the Greek alphabet. The researchers call their model the Bayesian program learning (BPL) framework.

The researchers applied their model to more than 1,600 types of handwritten characters in 50 writing systems, including Sanskrit, Tibetan, Gujarati, Glagolitic, and even invented characters such as those from the animated series Futurama and the online game Dark Horizon. In a kind of Turing test, the scientists found that volunteers recruited via Amazon’s Mechanical Turk had difficulty distinguishing machine-written characters from human-written ones.

The scientists also had their model focus on creative tasks. They asked their system to create whole new concepts — for instance, creating a new Tibetan letter based on what it knew about letters in the Tibetan alphabet. The researchers found human volunteers rated machine-written characters on par with ones developed by humans recruited for the same task.

“We got human-level performance on this creative task,” says study co-author Ruslan Salakhutdinov at the University of Toronto.

Potential applications for this model could include:

  • handwriting recognition,
  • speech recognition,
  • gesture recognition, and
  • object recognition.
“Ultimately we’re trying to figure out how we can get systems that come closer to displaying human-like intelligence,” Salakhutdinov says. “We’re still very, very far from getting there, though.”

The scientists detailed their findings in the December 11 issue of the journal Science.


By Charles Q. Choi
Posted 10 Dec 2015 | 20:00 GMT

Elon Musk And Sam Altman Launch OpenAI, A Nonprofit That Will Use AI To ‘Benefit Humanity’

By Hugo Angel,

Led by an all-star team of Silicon Valley’s best and brightest, OpenAI already has $1 billion in funding.
Silicon Valley is in the midst of an artificial intelligence war, as giants like Facebook and Google attempt to outdo each other by deploying machine learning and AI to automate services. But a brand-new organization called OpenAI—helmed by Elon Musk and a posse of prominent techies—aims to use AI to “benefit humanity,” without worrying about profit.
Musk, the CEO of SpaceX and Tesla, took to Twitter to announce OpenAI on Friday afternoon.

The organization, the formation of which has been in discussions for quite a while, came together in earnest over the last couple of months, co-chair and Y Combinator CEO Sam Altman told Fast Company. It is launching with $1 billion in funding from the likes of Altman, Musk, LinkedIn founder Reid Hoffman, and Palantir chairman Peter Thiel. In an introductory blog post, the OpenAI team said “we expect to only spend a tiny fraction of this in the next few years.”

Noting that it’s not yet clear what it will accomplish, OpenAI explains that its nonprofit status should afford it more flexibility. “Since our research is free from financial obligations, we can better focus on a positive human impact,” the blog post reads. “We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as is possible safely.”

The organization features an all-star group of leaders: Musk and Altman are co-chairs, while Google research scientist Ilya Sutskever is research director and Greg Brockman is CTO, a role he formerly held at payments company Stripe.

For nearly everyone involved in OpenAI, the project will be full-time work, Altman explained. For his part, it will be a “major commitment,” while Musk is expected to “come in every week, every other week, something like that.”

Altman explained that everything OpenAI works on—including any intellectual property it creates—will be made public. The one exception, he said, is if it could pose a risk. “Generally speaking,” Altman told Fast Company, “we’ll make all our patents available to the world.”

Companies like Facebook and Google are working fast to use AI. Just yesterday, Facebook announced it is open-sourcing new computing hardware, known as “Big Sur,” that doubles the power and efficiency of computers currently available for AI research. Facebook has also recently talked about using AI to help its blind users, as well as to make broad tasks easier on the giant social networking service. Google, according to Recode, has also put significant efforts into AI research and development, but has been somewhat less willing to give away the fruits of its labor.

Altman said he imagines that OpenAI will work with both of those companies, as well as any others interested in AI. “One of the nice things about our structure is that because there is no fiduciary duty,” he said, “we can collaborate with anyone.”

For now, there are no specific collaborations in the works, Altman added, though he expects that to change quickly now that OpenAI has been announced.

Ultimately, while many companies are working on artificial intelligence as part of for-profit projects, Altman said he thinks OpenAI’s mission—and funding—shouldn’t threaten anyone. “I would be very concerned if they didn’t like our mission,” he said. “We’re just trying to create new knowledge and give it to the world.”

ORIGINAL: FastCompany
By Daniel Terdiman

OpenAI’s research director is Ilya Sutskever, one of the world experts in machine learning. Our CTO is Greg Brockman, formerly the CTO of Stripe. The group’s other founding members are world-class research engineers and scientists: Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, John Schulman, Pamela Vagata, and Wojciech Zaremba. Pieter Abbeel, Yoshua Bengio, Alan Kay, Sergey Levine, and Vishal Sikka are advisors to the group. OpenAI’s co-chairs are Sam Altman and Elon Musk. Sam, Greg, Elon, Reid Hoffman, Jessica Livingston, Peter Thiel, Amazon Web Services (AWS), Infosys, and YC Research are donating to support OpenAI. In total, these funders have committed $1 billion, although we expect to only spend a tiny fraction of this in the next few years.

You can follow us on Twitter at @open_ai or email us at [email protected].

Scaling up synthetic-biology innovation

By Hugo Angel,

Gen9’s BioFab platform synthesizes small DNA fragments on silicon chips
and uses other technologies to build longer DNA constructs from those
fragments. Done in parallel, this produces hundreds to thousands of
DNA constructs simultaneously. Shown here is an automated
liquid-handling instrument that dispenses DNA onto the chips. Courtesy of Gen9
MIT professor’s startup makes synthesizing genes many times more cost effective.
Inside and outside of the classroom, MIT professor Joseph Jacobson has become a prominent figure in — and advocate for — the emerging field of synthetic biology.

As head of the Molecular Machines group at the MIT Media Lab, Jacobson’s work has focused on, among other things, developing technologies for the rapid fabrication of DNA molecules. In 2009, he spun out some of his work into .Gen9, which aims to boost synthetic-biology innovation by offering scientists more cost-effective tools and resources.
Headquartered in Cambridge, Massachusetts, Gen9 has developed a method for synthesizing DNA on silicon chips, which significantly cuts costs and accelerates the creation and testing of genes. Commercially available since 2013, the platform is now being used by dozens of scientists and commercial firms worldwide.
Synthetic biologists synthesize genes by combining strands of DNA. These new genes can be inserted into microorganisms such as yeast and bacteria. Using this approach, scientists can tinker with the cells’ metabolic pathways, enabling the microbes to perform new functions, including testing new antibodies, sensing chemicals in an environment, or creating biofuels.

But conventional gene-synthesizing methods can be time-consuming and costly. Chemical-based processes, for instance, cost roughly 20 cents per base pair — DNA’s key building block — and produce one strand of DNA at a time. This adds up in time and money when synthesizing genes comprising 100,000 base pairs.

Gen9’s chip-based DNA, however, drops the price to roughly 2 cents per base pair, Jacobson says. Additionally, hundreds of thousands of base pairs can be tested and compiled in parallel, as opposed to testing and compiling each pair individually through conventional methods.
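The per-gene cost difference implied by those figures is easy to check; a quick back-of-the-envelope sketch using the prices quoted above (not Gen9's actual pricing schedule):

```python
# Rough cost comparison for synthesizing a 100,000-base-pair gene,
# using the per-base-pair figures quoted in the article (in cents).
CONVENTIONAL_CENTS_PER_BP = 20  # chemical-based synthesis
CHIP_CENTS_PER_BP = 2           # Gen9's chip-based synthesis

gene_length_bp = 100_000

conventional_dollars = gene_length_bp * CONVENTIONAL_CENTS_PER_BP / 100
chip_dollars = gene_length_bp * CHIP_CENTS_PER_BP / 100

print(f"Conventional: ${conventional_dollars:,.0f}")  # Conventional: $20,000
print(f"Chip-based:   ${chip_dollars:,.0f}")          # Chip-based:   $2,000
```

A tenfold drop per base pair translates directly into a tenfold drop per gene, before counting the additional savings from parallel synthesis.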

This means faster testing and development of new pathways — which usually takes many years — for applications such as advanced therapeutics, and more effective enzymes for detergents, food processing, and biofuels, Jacobson says. “If you can build thousands of pathways on a chip in parallel, and can test them all at once, you get to a working metabolic pathway much faster,” he says.

Over the years, Jacobson and Gen9 have earned many awards and honors. In November, Jacobson was also inducted into the National Inventors Hall of Fame for co-inventing E Ink, the electronic ink used for Amazon’s Kindle e-reader display.

Scaling gene synthesizing

Throughout the early and mid-2000s, a few important pieces of research came together to allow for the scaling up of gene synthesis, which ultimately led to Gen9.

First, Jacobson and his students Chris Emig and Brian Chow began developing chips with thousands of “spots,” which each contained about 100 million copies of a different DNA sequence.

Then, Jacobson and another student, David Kong, created a process that used a certain enzyme as a catalyst to assemble those small DNA fragments into larger DNA strands inside microfluidics devices — “which was the first microfluidics assembly of DNA ever,” Jacobson says.

Despite the novelty, however, the process still wasn’t entirely cost effective. On average, it produced a 99 percent yield, meaning that about 1 percent of the base pairs didn’t match when constructing larger strands. That’s not so bad for making genes with 100 base pairs. “But if you want to make something that’s 10,000 or 100,000 bases long, that’s no good anymore,” Jacobson says.

Around 2004, Jacobson and then-postdoc Peter Carr, along with several other students, found a way to drastically increase yields by taking a cue from a natural error-correcting protein, Mut-S, which recognizes mismatches in DNA base pairing that occur when two DNA strands form a double helix. For synthetic DNA, the protein can detect and extract mismatches arising in base pairs synthesized on the chip, improving yields. In a paper published that year in Nucleic Acids Research, the researchers wrote that this process reduces the frequency of errors, from one in every 100 base pairs to around one in every 10,000.
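The reason that hundredfold error reduction matters so much at scale: the chance of an error-free construct shrinks exponentially with length. A back-of-the-envelope sketch, assuming (as a simplification) that each base is correct independently:

```python
# Probability that an entire construct assembles error-free, assuming each
# base is correct independently with probability (1 - error_rate).
def error_free_fraction(error_rate, length_bp):
    return (1 - error_rate) ** length_bp

# One error per 100 bases: workable for short genes, hopeless for long ones.
print(error_free_fraction(1 / 100, 100))     # ~0.37
print(error_free_fraction(1 / 100, 10_000))  # effectively zero

# One error per 10,000 bases (after Mut-S correction): a 10,000-base
# construct is now about as viable as a 100-base one was before.
print(error_free_fraction(1 / 10_000, 10_000))  # ~0.37
```

The symmetry in the output is the point: cutting the per-base error rate by a factor of 100 extends the practical construct length by roughly the same factor.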

With these innovations, Jacobson launched Gen9 with two co-founders: George Church of Harvard University, who was also working on synthesizing DNA on microchips, and Drew Endy of Stanford University, a world leader in synthetic-biology innovations.

Together with employees, they created a platform called BioFab and several other tools for synthetic biologists. Today, clients use an online portal to order gene sequences. Then Gen9 designs and fabricates those sequences on chips and delivers them to customers. Recently, the startup updated the portal to allow drag-and-drop capabilities and options for editing and storing gene sequences.

This allows users to “make these very extensive libraries that have been inaccessible previously,” Jacobson says.

Fueling big ideas

Many published studies have already used Gen9’s tools, several of which are posted to the startup’s website. Notable ones, Jacobson says, include designing proteins for therapeutics. In those cases, the researcher needs to make 10 million or 100 million versions of a protein, each comprising maybe 50,000 pieces of DNA, to see which ones work best.

Instead of making and testing DNA sequences one at a time with conventional methods, Gen9 lets researchers test hundreds of thousands of sequences at once on a chip. This should increase the chances of finding the right protein more quickly. “If you just have one shot you’re very unlikely to hit the target,” Jacobson says. “If you have thousands or tens of thousands of shots on a goal, you have a much better chance of success.”

Currently, all the world’s synthetic-biology methods produce only about 300 million bases per year. About 10 of the chips Gen9 uses to make DNA can hold the same amount of content, Jacobson says. In principle, he says, the platform used to make Gen9’s chips — based on collaboration with manufacturing firm Agilent — could produce enough chips to cover about 200 billion bases. This is about the equivalent capacity of GenBank, an open-access database of DNA bases and gene sequences that has been constantly updated since the 1980s.

Such technology could soon be worth a pretty penny: According to a study published in November by MarketsandMarkets, a major marketing research firm, the market for synthesizing short DNA strands is expected to reach roughly $1.9 billion by 2020.

Still, Gen9 is pushing to drop costs for synthesis to under 1 cent per base pair, Jacobson says. Additionally, for the past few years, the startup has hosted an annual G-Prize Competition, which awards 1 million base pairs of DNA to researchers with creative synthetic-biology ideas. That’s a prize worth roughly $100,000.

The aim, Jacobson says, is to remove cost barriers for synthetic biologists to boost innovation. “People have lots of ideas but are unable to try out those ideas because of cost,” he says. “This encourages people to think about bigger and bigger ideas.”


Rob Matheson | MIT News Office
December 10, 2015

Facebook Joins Stampede of Tech Giants Giving Away Artificial Intelligence Technology

By Hugo Angel,

Leading computing companies are helping both themselves and others by open-sourcing AI tools.
Facebook designed this server to put new power behind the simulated
neurons that enable software to do smart things like recognize speech or
the content of photos.
Facebook is releasing for free the designs of a powerful new computer server it crafted to put more power behind artificial-intelligence software. Serkan Piantino, an engineering director in Facebook’s AI Research group, says the new servers are twice as fast as those Facebook used before. “We will discover more things in machine learning and AI as a result,” he says.

The social network’s giveaway is the latest in a recent flurry of announcements by tech giants that are open-sourcing artificial-intelligence technology, which is becoming vital to consumer and business-computing services. Opening up the technology is seen as a way to accelerate progress in the broader field, while also helping tech companies to boost their reputations and make key hires.

In November, Google opened up software called TensorFlow, used to power the company’s speech recognition and image search (see “Here’s What Developers Are Doing with Google’s AI Brain”). Just three days later Microsoft released software that distributes machine-learning software across multiple machines to make it more powerful. Not long after, IBM announced the fruition of an earlier promise to open-source SystemML, originally developed to use machine learning to find useful patterns in corporate databanks.

Facebook’s new server design, dubbed Big Sur, was created to power deep-learning software, which processes data using roughly simulated neurons (see “Teaching Computers to Understand Us”). The invention of ways to put more power behind deep learning, using graphics processors, or GPUs, was crucial to recent leaps in the ability of computers to understand speech, images, and language. Facebook worked closely with Nvidia, a leading manufacturer of GPUs, on its new server designs, which have been stripped down to cram in more of the chips. The hardware can be used to run Google’s TensorFlow software.

Yann LeCun, director of Facebook’s AI Research group, says that one reason to open up the Big Sur designs is that the social network is well placed to slurp up any new ideas it can unlock. “Companies like us actually thrive on fast progress; the faster the progress can be made, the better it is for us,” says LeCun. Facebook open-sourced deep-learning software of its own in February of this year.

LeCun says that opening up Facebook’s technology also helps attract leading talent. A company can benefit by being seen as benevolent, and also by encouraging people to become familiar with a particular way of working and thinking. As Google, Facebook, and other companies have increased their investments in artificial intelligence, competition to hire experts in the technology has intensified (see “Is Google Cornering the Market in Deep Learning?”).

Derek Schoettle, general manager of IBM’s Cloud Data Services unit, which offers tools to help companies analyze data, says that machine-learning technology has to be opened up for it to become widespread. Open-source projects have played a major role in establishing large-scale databases and data analysis as the bedrock of modern computing companies large and small, he says. Real value tends to lie in what companies can do with the tools, not the tools themselves.

“What’s going to be interesting and valuable is the data that’s moving in that system and the ways people can find value in that data,” he says. Late last month, IBM transferred its SystemML machine-learning software, designed around techniques other than deep learning, to the Apache Software Foundation, which supports several major open-source projects.

Facebook’s Big Sur server design will be submitted to the Open Compute Project, a group started by the social network through which companies including Apple and Microsoft share designs of computing infrastructure to drive down costs (see “Inside Facebook’s Not-So-Secret New Data Center”).

ORIGINAL: Technology Review
By Tom Simonite
December 10, 2015

Quantum Computing

By Hugo Angel,

Image credit: Yuri Samoilov on Flickr
Scientists are exploiting the laws of quantum mechanics to create computers with an exponential increase in computing power.
Since their creation in the 1950s and 1960s, digital computers have become a mainstay of modern life. Originally taking up entire rooms and taking many hours to perform simple calculations, they have become both highly portable and extremely powerful. Computers can now be found in many people’s pockets, on their desks, in their watches, their televisions and their cars. Our demand for processing power continues to increase as more people connect to the internet and the integration of computing into our lives increases.
Video source: In a nutshell – Kurzgesagt / YouTube.
When Moore’s Law meets quantum mechanics
In 1965, Gordon Moore, co-founder of Intel, one of the world’s largest computer companies, first described what has now become known as Moore’s Law. An observation rather than a physical law, Moore’s Law states that the number of components that can fit on a computer chip doubles roughly every two years, and this observation has proven to hold true over the decades. Accordingly, the processing power and memory capacity of computers has doubled every two years as well.
Starting from computer chips that held a few thousand components in the 1960s, chips today hold several billion components. There is a physical limit to how small these components can get, and as they near the size of an atom, the quirky rules that govern quantum mechanics come into play. These rules are so different from those of the macro world that our traditional understanding of binary logic in a computer no longer works effectively. Quantum laws are based on probabilities, so a computer on this scale no longer works in a ‘deterministic’ manner, where it gives us a definite answer. Rather, it starts to behave in a ‘probabilistic’ way: the answer the computer gives is based on probabilities, each result could fluctuate, and we would have to try several times to get a reliable answer.
So if we want to keep increasing computer power, we are going to have to find a new way. Instead of being stymied or trying to avoid the peculiarities of quantum mechanics, we must find ways to exploit them.
Source: TEDx Talks on YouTube.
Bits and qubits
In the computer that sits on your desk, your smartphone, or the biggest supercomputer in the world, information, be it text, pictures or sound, is stored very simply as a number. The computer does its job by performing arithmetic calculations upon all these numbers. For example, every pixel in a photo is assigned numbers that represent its colour or brightness, numbers that can then be used in calculations to change or alter the image.
The computer saves these numbers in binary form instead of the decimal form that we use every day. In binary, there are only two numbers: 0 and 1. In a computer, these are known as ‘bits’, short for ‘binary digits’. Every piece of information in your computer is stored as a string of these 0s and 1s. As there are only two options, the 1 or the 0, it’s easy to store these using a number of different methods—for example, as magnetic dots on a hard drive, where the bit is either magnetised one way (1) or another (0), or where the bit has a tiny amount of electrical charge (1) or no charge (0). These combinations of 0s and 1s can represent almost anything, including letters, sounds and commands that tell the computer what to do.
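The 0s-and-1s encoding above is easy to see from any programming language; a small Python illustration:

```python
# A number rendered as a string of bits, zero-padded to 8 places.
n = 42
print(format(n, '08b'))        # 00101010

# A character is stored via its numeric code (here, ASCII/Unicode).
c = 'A'
print(format(ord(c), '08b'))   # 01000001

# Reversing the mapping recovers the original character.
bits = '01000001'
print(chr(int(bits, 2)))       # A
```

The same principle, scaled up, covers every file on a hard drive: text, images and sound are all strings of bits paired with a convention for decoding them.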
Instead of binary bits, a quantum computer uses qubits. These are particles, such as an atom, ion or photon, where the information is stored by manipulating the particles’ quantum properties, such as spin or polarisation states.
In a normal computer the many steps of a calculation are carried out one after the other. Even if the computer might work on several calculations in parallel, each calculation has to be done one step at a time. A quantum computer works differently. The qubits are programmed with a complex set of conditions, which formulates the question, and these conditions then evolve following the rules of the quantum world—Schrödinger’s wave equation—to find the answer. Each programmed qubit evolves simultaneously; all the steps of the calculation are taken at the same time. Mathematicians have found that this approach can solve a number of computational tasks that are very hard or time consuming on a classical computer. The speed advantage is enormous—and grows with the complexity we can program (i.e. the number of qubits the quantum computer has).
Individually, each qubit has its own quantum properties, such as spin. This has two values, +1 and -1, but can also be in what’s called a superposition: partly +1 and partly -1. If you think of a globe, you can point to the North Pole (+1) or the South Pole (-1) or any other point in between: London, or Sydney. A quantum particle can be in a state that is part North Pole and part South Pole.
A qubit with superposition is in a much more complex state than the simple 1 or 0 of a binary bit. More parameters are required to describe that state, and this translates to the amount of information a qubit can hold and process.
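A toy numerical sketch of that idea, a classical simulation of the probabilities rather than anything resembling real quantum hardware: one qubit held as two amplitudes, with measurement collapsing it to 0 or 1.

```python
import math
import random

# One qubit as a pair of amplitudes (alpha, beta), normalised so that
# |alpha|^2 + |beta|^2 = 1. An equal superposition corresponds to the
# "equator" point in the globe picture above.
alpha = 1 / math.sqrt(2)
beta = 1 / math.sqrt(2)

def measure():
    """Measurement yields 0 with probability |alpha|^2, otherwise 1."""
    return 0 if random.random() < abs(alpha) ** 2 else 1

counts = {0: 0, 1: 0}
for _ in range(10_000):
    counts[measure()] += 1

print(counts)  # roughly 5,000 of each outcome
```

The two amplitudes are the "more parameters" the paragraph above refers to: a classical bit needs one of two labels, while even a single qubit's state is a point on a continuum.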
Even more interesting is the fact that we can link many particles, each in their state of superposition, together. We can create a link, called entanglement, where all of these particles are dependent upon each other; all their properties exist at the same time. All the particles together are in one big state that evolves, according to the rules of quantum mechanics, as a single system. This is what gives quantum computers their power of parallel processing—the qubits all evolve, individual yet linked, simultaneously.
Imagine the complexity of all these combinations, all the superpositions. The number of parameters needed to fully describe N qubits grows as 2 to the power N. Basically, this means that for each qubit you add to the computer, the information required to describe the assembly of qubits doubles. Just 50 qubits would require more than a million billion numbers to describe their collective states or contents. This is where the supreme power of a quantum computer lies, since the evolution in time of these qubits corresponds to a bigger calculation, without costing more time.
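That doubling-per-qubit claim can be checked directly:

```python
# Number of complex amplitudes needed to describe N qubits: 2 ** N.
# Adding one qubit doubles the size of the description.
for n in (1, 2, 10, 30, 50):
    print(f"{n:2d} qubits -> {2 ** n:,} amplitudes")

# 50 qubits -> 1,125,899,906,842,624 amplitudes
```

This exponential blow-up is exactly why classical computers cannot efficiently simulate more than a few dozen qubits, and why the qubits' simultaneous evolution amounts to a calculation no classical machine can cheaply reproduce.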
For the particular tasks suited to quantum computers, a quantum computer with 30 qubits would be more powerful than the world’s most powerful supercomputer, and a 300 qubit quantum computer would be more powerful than every computer in the world connected together.
A delicate operation
An important feature of these quantum rules is that they are very sensitive to outside interference. The qubits must be kept completely isolated, so they are only being controlled by the laws of quantum mechanics, and not influenced by any environmental factors. Any disturbance to the qubits will cause them to leave their state of superposition—this is called decoherence. If the qubits decohere, the computation will break down. Creating a totally quiet, isolated environment is one of the great challenges of building a quantum computer.
Another challenge is transferring information from the quantum processor to some sort of quantum memory system that can preserve the information so that we can then read the answer. Researchers are working on developing ‘non-demolition’ readouts—ways to read the output of a computation without breaking the computation.
What are quantum computers useful for?
A lot of coverage of the applications of quantum computers talks about the huge gains in processing power over classical computers. Many statements have been made about being able to effortlessly solve hard problems instantaneously, but it’s not clear if all the promises will hold up. Rather than being able to solve all of the world’s financial, medical and scientific questions at the press of a button, it’s much more likely that, as with many major scientific projects, the knowledge gain that comes from building the computers will prove just as valuable as their potential applications.
The nearest term and most likely applications for quantum computers will be within quantum mechanics itself. Quantum computers will provide a useful new way of simulating and testing the workings of quantum theory, with implications for chemistry, biochemistry, nanotechnology and drug design. Search engine optimisation for internet searches, management of other types of big data, and optimising other systems, such as fleet routing and manufacturing processes, could also be impacted by quantum computing.
Another area where large scale quantum computers are predicted to have a big impact is that of data security. In a world where so much of our personal information is online, keeping our data—bank details or our medical records—secure is crucial. To keep it safe, our data is protected by encryption algorithms that the recipient needs to ‘unlock’ with a key. Prime number factoring is one method used to create encryption algorithms. The key is based on knowing the prime number factors of a large number. This sounds pretty basic, but it’s actually very difficult to figure out what the prime number factors of a large number are.
Classical computers can very easily multiply two prime numbers to find their product. But their only option when performing the operation in reverse is a repetitive process of checking one number after another. Even performing billions of calculations per second, this can take an extremely long time when the numbers get especially large. Once a number reaches over 1,000 digits, figuring out its prime factors is generally considered to take too long for a classical computer to calculate—the data encryption is ‘uncrackable’ and our data is kept safe and sound.
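The asymmetry is easy to demonstrate with tiny numbers (real encryption keys use numbers hundreds of digits long; the primes below are purely illustrative):

```python
# Multiplying two primes is one cheap operation; recovering them by
# trial division means checking candidate divisors one after another.
p, q = 104_729, 1_299_709  # two known primes
n = p * q                  # fast: a single multiplication

def factor(n):
    """Naive trial division over odd candidates (n is odd here,
    being a product of two odd primes)."""
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2
    return n, 1

print(factor(n))  # (104729, 1299709), after ~50,000 divisibility checks
```

Doubling the number of digits in the primes roughly squares the work for trial division while barely changing the cost of the multiplication, which is the gap that encryption schemes rely on.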
However, the superposed qubits of quantum computers change everything. In 1994, mathematician Peter Shor came up with an algorithm that would enable quantum computers to factor large numbers into their prime components significantly faster than classical methods can. As quantum computing advances we may need to change the way we secure our data so that quantum computers can’t access it.
Beyond these applications that we can foretell, there will undoubtedly be many new applications appearing as the technology develops. With classical computers, it was impossible to predict the advances of the internet, voice recognition and touch interfaces that are today so commonplace. Similarly, the most important breakthroughs to come from quantum computing are likely still unknown.
Quantum computer research in Australia
There are several centres of quantum computing research in Australia, working all over the country on a wide range of different problems. The Australian Research Council Centre of Excellence for Quantum Computation and Communication Technology (CQC2T) and the Centre of Excellence for Engineered Quantum Systems (EQuS) are both at the forefront of research in this field.
Two teams from the University of NSW (UNSW) are using silicon to create extremely coherent qubits in new ways, which opens the door to creating quantum computers from easy-to-manufacture components. One team is focusing on using silicon transistors like those in our laptops and smartphones.
The other UNSW-led team is working to create qubits from phosphorus atoms embedded in silicon. In 2012, this team created atom-sized components ten years ahead of schedule by making the world’s smallest transistor.
Source: UNSWTV on YouTube. View video details and transcript.
They placed a single phosphorus atom on a sheet of silicon, together with all the atomic-sized components needed to apply a voltage to the phosphorus atom and set the spin state that lets it function as a qubit.

The nuclear spins of single phosphorus atoms have been shown to have

  • the highest fidelity (>99%) and
  • the longest coherence time (>35 seconds)

of any qubit in the solid state, making them extremely attractive for a scalable system.

A team at the University of Queensland is working to develop quantum computing techniques using single photons as qubits. In 2010, this team conducted the first quantum chemistry simulation. This sort of task involves computing the complex quantum interactions between electrons and requires such complicated equations that performing the calculations with a classical computer necessarily requires a trade-off between accuracy and computational feasibility. Qubits, being in a quantum state themselves, are much more capable of representing these systems, and so offer great potential to the field of quantum chemistry.
This group has also demonstrated a photonic quantum device performing a task that is factorially difficult – i.e. one of the specific tasks that classical computers get stuck on.
Large sums of money are being invested in Australian quantum computing research. In 2014, the Commonwealth Bank made an investment of $5 million towards the Centre of Excellence for Quantum Computation and Communication Technology at the University of New South Wales. Also in 2014, Microsoft invested more than $10 million in engineered quantum systems at the University of Sydney.
It’s not very likely that in 20 years we’ll all be walking around with quantum devices in our pockets. Most likely, the first quantum computers will be servers that people will access to undertake complex calculations. However, it is not easy to predict the future. Who would have thought, fifty years ago, that we would enjoy the power and functionality of today’s computers, like the smartphones that so many of us now depend upon? Who can tell what technology will be at our beck and call if the power of quantum mechanics can be harnessed?

Here’s What Developers Are Doing with Google’s AI Brain

By Hugo Angel,

Google TensorFlow. Jeff Dean
Researchers outside Google are testing the software that the company uses to add artificial intelligence to many of its products.
Tech companies are racing to set the standard for machine learning, and to attract technical talent.
Jeff Dean speaks at a Google event in 2007. Credit: Photo by Niall Kennedy / CC BY-NC 2.0
An artificial intelligence engine that Google uses in many of its products, and that it made freely available last month, is now being used by others to perform some neat tricks, including 
  • translating English into Chinese, 
  • reading handwritten text, and 
  • even generating original artwork.
The AI software, called TensorFlow, provides a straightforward way for users to train computers to perform tasks by feeding them large amounts of data. The software incorporates various methods for efficiently building and training simulated “deep learning” neural networks across different computer hardware.
Deep learning is an extremely effective technique for training computers to recognize patterns in images or audio, enabling machines to perform useful tasks, such as recognizing faces or objects in images, with human-like competence. Recently, deep learning has also shown significant promise for parsing natural language, enabling machines to respond to spoken or written queries in meaningful ways.
Speaking at the Neural Information Processing Society (NIPS) conference in Montreal this week, Jeff Dean, the computer scientist at Google who leads the TensorFlow effort, said that the software is being used for a growing number of experimental projects outside the company.
These include software that generates captions for images and code that translates the documentation for TensorFlow into Chinese. Another project uses TensorFlow to generate artificial artwork. “It’s still pretty early,” Dean said after the talk. “People are trying to understand what it’s best at.”
TensorFlow grew out of a project at Google, called Google Brain, aimed at applying various kinds of neural network machine learning to products and services across the company. The reach of Google Brain has grown dramatically in recent years. Dean said that the number of projects at Google that involve Google Brain has grown from a handful in early 2014 to more than 600 today.
Most recently, Google Brain helped develop Smart Reply, a system that automatically recommends a quick response to messages in Gmail after scanning the text of an incoming message. The neural network technique used to develop Smart Reply was presented by Google researchers at the NIPS conference last year.
Dean expects deep learning and machine learning to have a similar impact on many other companies. “There is a vast array of ways in which machine learning is influencing lots of different products and industries,” he said. For example, the technique is being tested in many industries that try to make predictions from large amounts of data, ranging from retail to insurance.
Google was able to give away the code for TensorFlow because the data it owns is a far more valuable asset for building a powerful AI engine. The company hopes that the open-source code will help it establish itself as a leader in machine learning and foster relationships with collaborators and future employees. TensorFlow “gives us a common language to speak, in some sense,” Dean said. “We get benefits from having people we hire who have been using TensorFlow. It’s not like it’s completely altruistic.”
A neural network consists of layers of virtual neurons that fire in a cascade in response to input. A network “learns” as the sensitivity of these neurons is tuned to match particular input and output, and having many layers makes it possible to recognize more abstract features, such as a face in a photograph.
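As a toy illustration of that tuning process, here is a single “virtual neuron” in plain Python whose weights are nudged by gradient descent until its firing matches the OR function. The learning rate, epoch count, and random seed are arbitrary illustrative choices, not anything from TensorFlow itself:

```python
import math
import random

def sigmoid(z):
    """Smooth 0-to-1 'firing' response of a virtual neuron."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, lr=1.0, epochs=1000):
    """Tune one neuron's sensitivity (its weights and bias) by gradient
    descent until its output matches the target for each input."""
    random.seed(0)  # deterministic start, for illustration
    w = [random.uniform(-1, 1), random.uniform(-1, 1)]
    b = 0.0
    for _ in range(epochs):
        for (x0, x1), target in data:
            out = sigmoid(w[0] * x0 + w[1] * x1 + b)
            g = (out - target) * out * (1 - out)  # squared-error gradient
            w[0] -= lr * g * x0
            w[1] -= lr * g * x1
            b -= lr * g
    return w, b

# The OR function as four input/output examples.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_neuron(data)
for (x0, x1), target in data:
    print((x0, x1), round(sigmoid(w[0] * x0 + w[1] * x1 + b)), target)
```

A deep network is, conceptually, many layers of such units trained the same way, which is what lets the higher layers respond to abstract features rather than raw pixels.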
TensorFlow is now one of several open-source deep learning software libraries, and its performance currently lags behind some other libraries for certain tasks. However, it is designed to be easy to use, and it can easily be ported between different hardware. And Dean says his team is hard at work trying to improve its performance.
In the race to dominate machine learning and attract the best talent, however, other companies may release competing AI engines of their own.
December 8, 2015

Google says its quantum computer is more than 100 million times faster than a regular computer chip

By Hugo Angel,

NASA Quantum Vesuvius Close Up
Above: The D-Wave 2X quantum computer at NASA Ames Research Lab in Mountain View, California, on December 8.
Image Credit: Jordan Novet/VentureBeat
Google appears to be more confident about the technical capabilities of its D-Wave 2X quantum computer, which it operates alongside NASA at the U.S. space agency’s Ames Research Center in Mountain View, California.
D-Wave’s machines are the closest thing we have today to quantum computing, which works with quantum bits, or qubits — each of which can be zero or one or both — instead of more conventional bits. The superposition of these qubits enables machines to perform great numbers of computations simultaneously, making a quantum computer highly desirable for certain types of processes.
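The “zero or one or both” idea can be made concrete: the state of n qubits is a vector of 2^n amplitudes, and applying a Hadamard operation to every qubit spreads the amplitude evenly over all 2^n basis states at once. A minimal Python sketch, tracking nothing but the amplitudes:

```python
import math

def hadamard_all(n):
    """State vector after a Hadamard on each of n qubits, starting from
    |00...0>: an equal superposition of all 2**n basis states."""
    amp = 1.0 / math.sqrt(2 ** n)
    return [amp] * (2 ** n)

state = hadamard_all(3)
print(len(state))                 # 8 basis states from just 3 qubits
print(sum(a * a for a in state))  # squared amplitudes (probabilities) sum to 1
```

Each added qubit doubles the number of simultaneously represented values, which is the source of the “great numbers of computations” intuition, although extracting an answer still requires a measurement that collapses the state.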
The Google NASA Quantum Artificial Intelligence Lab today announced that, in two tests, it found the D-Wave machine to be considerably faster than simulated annealing — a simulation of quantum computation on a classical computer chip.
Google director of engineering Hartmut Neven went over the results of the tests in a blog post today:
We found that for problem instances involving nearly 1,000 binary variables, quantum annealing significantly outperforms its classical counterpart, simulated annealing. It is more than 10^8 times faster than simulated annealing running on a single core. We also compared the quantum hardware to another algorithm called Quantum Monte Carlo. This is a method designed to emulate the behavior of quantum systems, but it runs on conventional processors. While the scaling with size between these two methods is comparable, they are again separated by a large factor sometimes as high as 10^8.
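For readers unfamiliar with the classical baseline in that comparison: simulated annealing searches for a low-“energy” assignment of binary variables by accepting worsening moves with a probability that shrinks as a temperature parameter is cooled. The following toy Python sketch invents a trivial problem, schedule, and parameters purely for illustration:

```python
import math
import random

def simulated_annealing(energy, neighbor, x0, steps=20000, t0=5.0):
    """Classical annealing: accept a worsening move with probability
    exp(-dE/T) while the temperature T cools toward zero."""
    random.seed(1)  # deterministic run, for illustration
    x, e = x0, energy(x0)
    best, best_e = x, e
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9   # linear cooling schedule
        y = neighbor(x)
        de = energy(y) - e
        if de <= 0 or random.random() < math.exp(-de / t):
            x, e = y, e + de
        if e < best_e:
            best, best_e = x, e
    return best, best_e

# Toy problem over binary variables: count disagreements with a hidden
# target bit string (invented for illustration).
target = [1, 0, 1, 1, 0, 0, 1, 0]

def energy(bits):
    return sum(b != t for b, t in zip(bits, target))

def neighbor(bits):
    flipped = list(bits)
    flipped[random.randrange(len(bits))] ^= 1  # flip one random bit
    return flipped

best, best_e = simulated_annealing(energy, neighbor, [0] * 8)
print(best_e)  # 0 -- the annealer recovers the target string
```

Quantum annealing attacks the same kind of energy-minimization problem, but uses quantum tunneling through energy barriers rather than thermally hopping over them.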
Google has also published a paper on the findings.
If nothing else, this is a positive signal for venture-backed D-Wave, which has also sold quantum computers to Lockheed Martin and Los Alamos National Laboratory. At an event at NASA Ames today where reporters looked at the D-Wave machine, D-Wave chief executive Vern Brownell sounded awfully pleased with the result. Without question, the number 100,000,000 is impressive. It’s certainly the kind of thing the startup can show when it attempts to woo IT buyers and explain why its technology might well succeed in disrupting legacy chipmakers such as Intel.
But Google continues to work with NASA on quantum computing, and meanwhile Google also has its own quantum computing hardware lab. And in that initiative, Google is still in the early days.
“I would say building a quantum computer is really, really hard, so first of all, we’re just trying to get it to work and not worry about cost or size or whatever,” said John Martinis, the person leading Google’s hardware program and a professor of physics at the University of California, Santa Barbara.
Commercial applications of this technology might not happen overnight, but it’s possible that eventually they could lead to speed-ups for things like image recognition, which is in place inside of many Google services. But the tool could also come in handy for a traditional thing like cleaning up dirty data. Outside of Google, quantum speed-ups could translate into improvements for planning and scheduling and air traffic management, said David Bell, director of the Universities Space Research Association’s Research Institute for Advanced Computer Science, which also works on the D-Wave machine at NASA Ames.
ORIGINAL: Venture Beat
DECEMBER 8, 2015

IBM’s SystemML machine learning system becomes Apache Incubator project

By Hugo Angel,

There’s a race between tech giants to open source machine learning systems and become a dominant platform. Apache SystemML has clear enterprise spin.
IBM on Monday said its machine learning system, dubbed SystemML, has been accepted as an open source project by the Apache Incubator.
Machine learning, task automation and robotics are already widely used in business. These and other AI technologies are about to multiply, and we look at how organizations can best take advantage of them.
The Apache Incubator is an entry to becoming a project of The Apache Software Foundation. The general idea behind the incubator is to ensure code donations adhere to Apache’s legal guidelines and communities follow guiding principles.
IBM said it would donate SystemML as an open source project in June.
What’s notable about IBM’s SystemML milestone is that open sourcing machine learning systems is becoming a trend.
For enterprises, the upshot is that there will be a bevy of open source machine learning code bases to consider. Google TensorFlow and Facebook Torch are tools to train neural networks. SystemML is aimed at broadening the ecosystem to business use.
Why are tech giants going open source with their machine learning tools?
The machine learning platform that gets the most data will learn faster and then become more powerful. That cycle will just result in more data to ingest. IBM is looking to work the enterprise angle on machine learning. Microsoft may be another entry on the enterprise side, but may not go the Apache route.
In addition, there are precedents to how open sourcing big analytics ideas can pay off. MapReduce and Hadoop started as open source projects and would be a cousin of whatever Apache machine learning system wins out.
IBM’s SystemML, which is now Apache SystemML, is used to create industry specific machine learning algorithms for enterprise data analysis. IBM created SystemML so it could write one codebase that could apply to multiple industries and platforms. If SystemML can scale, IBM’s Apache move could provide a gateway to its other analytics wares.
The Apache SystemML project has included more than 320 patches covering everything from APIs to data ingestion and documentation, more than 90 contributions to Apache Spark, and 15 additional organizations contributing to the SystemML engine.
Here’s the full definition of the Apache SystemML project:
SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans, ranging from single-node, in-memory computations to distributed computations on Apache Hadoop and Apache Spark. ML algorithms are expressed in an R- or Python-like syntax that includes linear algebra primitives, statistical functions, and ML-specific constructs. This high-level language significantly increases the productivity of data scientists, as it provides (1) full flexibility in expressing custom analytics, and (2) data independence from the underlying input formats and physical data representations. Automatic optimization according to data characteristics (such as distribution on the disk file system and sparsity) as well as processing characteristics of the distributed environment (such as the number of nodes and the CPU and memory per node) ensures both efficiency and scalability.
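SystemML’s own DML language is not shown here; as a rough Python stand-in, this sketch illustrates the level at which such a declarative script is written. The author states the whole algorithm (here, ordinary least squares for a line fit) in terms of mathematical operations and leaves the execution strategy, single-node or distributed, to the engine:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a + b*x, written as the math
    itself rather than as instructions for any particular runtime."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Points on the line y = 1 + 2x recover intercept 1.0 and slope 2.0.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

The point of the declarative approach is that this same specification need not change when the data grows from a list in memory to a matrix sharded across a Hadoop or Spark cluster.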
November 23, 2015

Allen Institute researchers decode patterns that make our brains human

By Hugo Angel,

Each of our human brains is special, carrying distinctive memories and giving rise to our unique thoughts and actions. Most research on the brain focuses on what makes one brain different from another. But recently, Allen Institute researchers turned the question around.
Conserved gene patterning across human brains provides insights into health and disease
The human brain may be the most complex piece of organized matter in the known universe, but Allen Institute researchers have begun to unravel the genetic code underlying its function. Research published this month in Nature Neuroscience identified a surprisingly small set of molecular patterns that dominate gene expression in the human brain and appear to be common to all individuals, providing key insights into the core of the genetic code that makes our brains distinctly human.
“So much research focuses on the variations between individuals, but we turned that question on its head to ask, what makes us similar?” says Ed Lein, Ph.D., Investigator at the Allen Institute for Brain Science. “What is the conserved element among all of us that must give rise to our unique cognitive abilities and human traits?”
Researchers used data from the publicly available Allen Human Brain Atlas to investigate how gene expression varies across hundreds of functionally distinct brain regions in six human brains. They began by ranking genes by the consistency of their expression patterns across individuals, and then analyzed the relationship of these genes to one another and to brain function and association with disease.
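A toy Python sketch of that ranking step is below. It uses mean pairwise Pearson correlation as the consistency measure and invented expression values; the study’s actual metric and data are far richer, so this only conveys the idea:

```python
import itertools

def pearson(u, v):
    """Pearson correlation between two expression profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def consistency(profiles_by_brain):
    """Mean pairwise correlation of one gene's regional expression
    profile across brains: high means the pattern is conserved."""
    pairs = list(itertools.combinations(profiles_by_brain, 2))
    return sum(pearson(u, v) for u, v in pairs) / len(pairs)

# Hypothetical expression levels over four brain regions in three brains.
conserved_gene = [[1, 4, 2, 8], [2, 5, 2, 9], [1, 4, 3, 8]]
variable_gene = [[1, 4, 2, 8], [8, 2, 4, 1], [3, 3, 3, 4]]
print(consistency(conserved_gene) > consistency(variable_gene))  # True
```

Ranking all ~20,000 genes by such a score is what lets the most reproducible, cross-individual patterns rise to the top for further analysis.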
“Looking at the data from this unique vantage point enables us to study gene patterning that we all share,” says Mike Hawrylycz, Ph.D., Investigator at the Allen Institute for Brain Science. “We used the Allen Human Brain Atlas data to quantify how consistent the patterns of expression for various genes are across human brains, and to determine the importance of the most consistent and reproducible genes for brain function.”
Despite the anatomical complexity of the brain and the complexity of the human genome, most of the patterns of gene usage across all 20,000 genes could be characterized by just 32 expression patterns. While many of these patterns were similar in human and mouse, the dominant genetic model organism for biomedical research, many genes showed different patterns in human. Surprisingly, genes associated with neurons were most conserved across species, while those for the supporting glial cells showed larger differences.
The most highly stable genes—the genes that were most consistent across all brains—include those that are associated with diseases and disorders like autism and Alzheimer’s and include many existing drug targets. These patterns provide insights into what makes the human brain distinct and raise new opportunities to target therapeutics for treating disease.
The researchers also found that the pattern of gene expression in cerebral cortex is correlated with “functional connectivity” as revealed by neuroimaging data from the Human Connectome Project. “It is exciting to find a correlation between brain circuitry and gene expression by combining high quality data from these two large-scale projects,” says David Van Essen, Ph.D., professor at Washington University in St. Louis and a leader of the Human Connectome Project.
“The human brain is phenomenally complex, so it is quite surprising that a small number of patterns can explain most of the gene variability across the brain,” says Christof Koch, Ph.D., President and Chief Scientific Officer at the Allen Institute for Brain Science. “There could easily have been thousands of patterns, or none at all. This gives us an exciting way to look further at the functional activity that underlies the uniquely human brain.”
This research was conducted in collaboration with the Cincinnati Children’s Hospital and Medical Center and Washington University in St. Louis.
The project described was supported by award numbers 1R21DA027644 and 5R33DA027644 from the National Institute on Drug Abuse. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health and the National Institute on Drug Abuse.

About the Allen Institute for Brain Science
The Allen Institute for Brain Science is an independent, 501(c)(3) nonprofit medical research organization dedicated to accelerating the understanding of how the human brain works in health and disease. Using a big science approach, the Allen Institute generates useful public resources used by researchers and organizations around the globe, drives technological and analytical advances, and discovers fundamental brain properties through integration of experiments, modeling and theory. Launched in 2003 with a seed contribution from founder and philanthropist Paul G. Allen, the Allen Institute is supported by a diversity of government, foundation and private funds to enable its projects. Given the Institute’s achievements, Mr. Allen committed an additional $300 million in 2012 for the first four years of a ten-year plan to further propel and expand the Institute’s scientific programs, bringing his total commitment to date to $500 million. The Allen Institute’s data and tools are publicly available online.
ORIGINAL: Allen Institute
November 16, 2015

How swarm intelligence could save us from the dangers of AI

By Hugo Angel,

Image Credit: diez artwork/Shutterstock
We’ve heard a lot of talk recently about the dangers of artificial intelligence. From Stephen Hawking and Bill Gates to Elon Musk and Steve Wozniak, luminaries around the globe have been sounding the alarm, warning that we could lose control over this powerful technology — after all, AI is about creating systems that have minds of their own. A true AI could one day adopt goals and aspirations that harm us.
But what if we could enjoy the benefits of AI while ensuring that human values and sensibilities remain an integral part of the system?
This is where something called Artificial Swarm Intelligence comes in – a method for building intelligent systems that keeps humans in the loop, merging the power of computational algorithms with the wisdom, creativity, and intuition of real people. A number of companies around the world are already exploring swarms.

  • There’s Enswarm, a UK startup that is using swarm technologies to assist with recruitment and employment decisions
  • There’s a startup using swarming and crypto-currencies like Bitcoin as a new model for fundraising
  • And the human swarming company I founded, Unanimous A.I., creates a unified intellect from any group of networked users.
This swarm intelligence technology may sound like science fiction, but it has its roots in nature.
It all goes back to the birds and the bees – fish and ants too. Across countless species, social groups have developed methods of amplifying their intelligence by working together in closed-loop systems. Known commonly as flocks, schools, colonies, and swarms, these natural systems enable groups to combine their insights and thereby outperform individual members when solving problems and making decisions. Scientists call this “Swarm Intelligence” and it supports the old adage that many minds are better than one.
But what about us humans?
Clearly, we lack the natural ability to form closed-loop swarms, but emerging technologies are filling that void, as they do for many other skills we can’t perform naturally. Leveraging our vast networking infrastructure, new software techniques are allowing online groups to form artificial swarms that can work in synchrony to answer questions, reach decisions, and make predictions, all while exhibiting the same types of intelligence amplification seen in nature. The approach is sometimes called “blended intelligence” because it combines the hardware and software technologies used by AI systems with populations of real people, creating human-machine systems that have the potential of outsmarting both humans and pure-software AIs alike.
It should be noted that “swarming” is different from traditional “crowdsourcing,” which generally uses votes, polls, or surveys to aggregate opinions. While such methods are valuable for characterizing populations, they don’t employ the real-time feedback loops that artificial swarms use to let a unique intelligent system emerge. It’s the difference between measuring what the average member of a group thinks versus allowing that group to think together and draw conclusions based upon their combined knowledge and intuition.
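A toy Python contrast makes the distinction concrete. The “swarm” below is not Unanimous A.I.’s actual algorithm, just an invented closed-loop update in which agents keep pulling a shared answer toward their preferences with a force reflecting their conviction:

```python
def poll(preferences):
    """One-shot aggregation: the plain average of stated preferences."""
    return sum(p for p, _ in preferences) / len(preferences)

def swarm(preferences, steps=200, rate=0.05):
    """Closed-loop aggregation: on every iteration, each agent pulls a
    shared answer toward its own preference with force proportional to
    its conviction, until the answer settles."""
    x = poll(preferences)  # start from the naive average
    for _ in range(steps):
        pull = sum(c * (p - x) for p, c in preferences)
        x += rate * pull / len(preferences)
    return x

# (preference, conviction) pairs -- hypothetical values.
agents = [(0.0, 0.2), (0.0, 0.2), (1.0, 1.0)]
print(round(poll(agents), 3))   # 0.333: what a survey would report
print(round(swarm(agents), 3))  # noticeably closer to 1.0
```

In the poll, every opinion counts equally once; in the loop, the strongly convinced minority keeps exerting pressure, so the settled answer reflects conviction as well as headcount.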
Outside of the companies I mentioned above, where else can such collective technologies be applied? One area that’s currently being explored is medical diagnosis, a process that requires deep factual knowledge along with the experiential wisdom of the practitioner. Can we merge the knowledge and wisdom of many doctors into a single emergent diagnosis that outperforms the diagnosis of a single practitioner? The answer appears to be yes. In a recent study conducted by Humboldt-University of Berlin and RAND Corporation, a computational collective of radiologists outperformed single practitioners when viewing mammograms, reducing false positives and false negatives. In a separate study conducted by John Carroll University and the Cleveland Clinic, a collective of 12 radiologists diagnosed skeletal abnormalities. As a computational collective, the radiologists produced a significantly higher rate of correct diagnosis than any single practitioner in the group. Of course, the potential of artificially merging many minds into a single unified intelligence extends beyond medical diagnosis to any field where we aim to exceed natural human abilities when making decisions, generating predictions, and solving problems.
Now, back to the original question of why Artificial Swarm Intelligence is a safer form of AI.
Although heavily reliant on hardware and software, swarming keeps human sensibilities and moralities as an integral part of the processes. As a result, this “human-in-the-loop” approach to AI combines the benefits of computational infrastructure and software efficiencies with the unique values that each person brings to the table:

  • creativity, 
  • empathy, 
  • morality, and 
  • justice. 

And because swarm-based intelligence is rooted in human input, the resulting intelligence is far more likely to be aligned with humanity – not just with our values and morals, but also with our goals and objectives.

How smart can an Artificial Swarm Intelligence get?
That’s still an open question, but with the potential to engage millions, even billions of people around the globe, each brimming with unique ideas and insights, swarm intelligence may be society’s best hope for staying one step ahead of the pure machine intelligences that emerge from busy AI labs around the world.
Louis Rosenberg is CEO of swarm intelligence company Unanimous A.I. He did his doctoral work at Stanford University in robotics, virtual reality, and human-computer interaction. He previously developed the first immersive augmented reality system as a researcher for the U.S. Air Force in the early 1990s and founded the VR company Immersion Corp and the 3D digitizer company Microscribe.
ORIGINAL: VentureBeat
NOVEMBER 22, 2015

A Visual History of Human Knowledge | Manuel Lima | TED Talks

By Hugo Angel,

How does knowledge grow? 

Source: EPFL Blue Brain Project. Blue Brain Circuit

Sometimes it begins with one insight and grows into many branches. Infographics expert Manuel Lima explores the thousand-year history of mapping data — from languages to dynasties — using trees of information. It’s a fascinating history of visualizations, and a look into humanity’s urge to map what we know.


Sep 10, 2015

PLOS and DBpedia – an experiment towards Linked Data

By Hugo Angel,

Editor’s Note: This article is coauthored by Bob Kasenchak, Director of Business Development/Taxonomist at Access Innovations.
PLOS publishes articles covering a huge range of disciplines. This was a key factor in PLOS deciding to develop its own thesaurus – currently with 10,767 Subject Area terms for classifying the content.
We wondered whether matching software could establish relationships between PLOS Subject Areas and corresponding terms in external datasets. These relationships could enable links between data resources and expose PLOS content to a wider audience. So we set out to see if we could populate a field for each term in the PLOS thesaurus with a link to an external resource that describes—or, is “the same as”—the concept in the thesaurus. If so, we could:
• Provide links between PLOS Subject Area pages and external resources
• Import definitions to the PLOS thesaurus from matching external resources
For example, adding Linked Data URIs to the Subject Areas would facilitate making the PLOS thesaurus available as part of the Semantic Web of linked vocabularies.
We decided to use DBpedia for this trial for two reasons:
Firstly, as stated by DBpedia: “The DBpedia knowledge base is served as Linked Data on the Web. As DBpedia defines Linked Data URIs for millions of concepts, various data providers have started to set RDF links from their data sets to DBpedia, making DBpedia one of the central interlinking-hubs of the emerging Web of Data.”
Figure 1: Linked Open Data Cloud with PLOS shown linking to DBpedia – the concept behind this project.


Secondly, DBpedia is constantly (albeit slowly) updated based on frequently-used Wikipedia pages, so it has a method of staying current. It also offers a way to add content to DBpedia pages, providing inbound links—so people can link (directly or indirectly) to PLOS Subject Area Landing Pages via DBpedia.
Figure 2: ‘Cognitive psychology’ pages in PLOS and DBpedia 


Which matching software to trial?
We considered two possibilities: Silk and Spotlight
  • The Silk method might have allowed more granular, specific, and accurate queries, but it would have required us to learn a new query language. 
  • Spotlight, on the other hand, is executable by a programmer via API and required little effort to run, re-run, and check results; it took only a matter of minutes to get results from a list of terms to match. 

So we decided to use Spotlight for this trial.
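As an illustration of what “executable by a programmer via API” looks like, here is a hedged Python sketch of querying Spotlight’s annotate endpoint and pulling out the suggested DBpedia URIs. The endpoint URL, parameters, and JSON field names reflect the public DBpedia Spotlight service as we understand it and may differ from the setup used in this trial:

```python
import json
from urllib import parse, request

# Public Spotlight endpoint (an assumption; deployments vary).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

def spotlight_uris(text, confidence=0.5):
    """Ask Spotlight to annotate `text` and return the DBpedia URIs found."""
    q = parse.urlencode({"text": text, "confidence": confidence})
    req = request.Request(f"{SPOTLIGHT_URL}?{q}",
                          headers={"Accept": "application/json"})
    with request.urlopen(req) as resp:
        return parse_uris(resp.read().decode())

def parse_uris(body):
    """Pull the '@URI' field out of each annotation resource."""
    doc = json.loads(body)
    return [r["@URI"] for r in doc.get("Resources", [])]

# A canned response in the shape Spotlight returns, for illustration:
sample = ('{"Resources": [{"@URI": '
          '"http://dbpedia.org/resource/Cognitive_psychology", '
          '"@surfaceForm": "Cognitive psychology"}]}')
print(parse_uris(sample))  # ['http://dbpedia.org/resource/Cognitive_psychology']
```

Looping such a call over a list of thesaurus terms is what makes it “a matter of minutes to get results from a list of terms to match.”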

Which sector of the thesaurus to target?
We chose the Psychology division of 119 terms (see Appendix) as a good starting point because it provides a reasonable number of test terms so that trends could emerge, and a range of technical terms (e.g. Neuropsychology) as well as general-language terms (e.g. Attention) to test the matching software.
Figure 3: Work flow.


Step 1: We created the External Link and Synopsis DBpedia fields in the MAIstro Thesaurus Master application to store the identified external URIs and definitions. The External Link field accommodates the corresponding external URI, and the Synopsis DBpedia field houses the definition – “dbo:abstract” in DBpedia.
Step 2: Matching DBpedia concepts with PLOS Subject Areas using Spotlight:
  • Phase 1: For the starting set of terms we chose Psychology (a Tier 2 term) and the 21 Narrower Terms that sit in Tier 3 immediately beneath Psychology (listed in the Appendix).
  • Phase 2: We then included the remaining 98 terms from Tier 4 and deeper beneath Psychology (listed in the Appendix).
Step 3: Importing External Link/Synopsis DBpedia to PLOS Thesaurus: Once a list of approved matching PLOS-term-to-DBpedia-page correspondences was established, another quick DBpedia Spotlight query provided the corresponding Definitions. Access Innovations populated the fields by loading the links and definitions to the corresponding term records. For the “Cognitive psychology” example these are:
Synopsis DBpedia: Cognitive psychology is the study of mental processes such as “attention, language use, memory, perception, problem solving, creativity, and thinking.” Much of the work derived from cognitive psychology has been integrated into various other modern disciplines of psychological study, including educational psychology, social psychology, personality psychology, abnormal psychology, developmental psychology, and economics.
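For reference, a definition like this can be retrieved from the public DBpedia SPARQL endpoint (https://dbpedia.org/sparql) with a query of roughly this shape; the resource URI is the standard DBpedia page for the term:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Cognitive_psychology> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
```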
How did it go?
The table shows the distribution of results for the 119 Subject Areas in the Psychology branch of the PLOS thesaurus:
Match outcome                              Terms   % of 119
Correct match was Spotlight's top hit        71     59.7%
Correct match in Spotlight hits 2-5          15     12.6%
Match found manually (not in top 5)          10      8.4%
No match found                               23     19.3%

Thus a total of 96 matches could be found by any method (80.7% of terms – top three rows of the Table). Of these, 86 terms (72.3% of terms) were matched as one of the top 5 Spotlight hits (top two rows of the Table), as compared to 71 matches (59.7% of terms) being identified correctly and directly by Spotlight as the top hit (top row of the Table).
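The percentages above follow directly from the counts; a quick check:

```python
total = 119          # Subject Areas in the Psychology branch
top_hit = 71         # correct match was Spotlight's top hit
top_5 = 86           # correct match anywhere in Spotlight's top 5
any_method = 96      # matched by any method, including manual search

pct = lambda n: round(100.0 * n / total, 1)
print(pct(top_hit), pct(top_5), pct(any_method))
# 59.7 72.3 80.7
```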

Figure 4 shows the two added fields “Synopsis DBpedia” and “External Link” in MAIstro, for “Cognitive Psychology”.
Figure 4: Addition of Synopsis DBpedia and External Link fields to MAIstro.


We had set out to establish whether matching software could define relationships between PLOS thesaurus terms and corresponding terms in external datasets. We used the Psychology division of the PLOS thesaurus as our test vocabulary, Spotlight as our matching software, and DBpedia as our target external dataset.
We found that unambiguous suitable matches were identified for 59.7% of terms. Expressed another way, a mismatch was identified as the top hit in 35 cases (29.4% of terms), which is a high burden of inaccuracy. This is too low a quality outcome for us to consider adopting Spotlight suggestions without editorial review.
As well as those terms that were matched as a top hit, a further 12.6% of terms (or 31% of the terms not successfully matched as a top hit) had a good match in Spotlight hit positions 2-5. So Spotlight successfully matched 72.3% of terms within the top 5 Spotlight matches.
Having the Spotlight hit list for each term did bring efficiency to finding the correspondences: both the “hits” and the “misses” were straightforward to identify. As an aid to establishing these links manually, Spotlight is extremely useful.
Stability of DBpedia: We noticed that the dbo:abstract content in DBpedia is not stable. It would be an enhancement to append the source URI and a date stamp to the Synopsis DBpedia field contents as a rudimentary versioning/quality-control measure.
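A minimal sketch of that versioning idea (the field names here are hypothetical, not the actual MAIstro schema):

```python
from datetime import date

def make_synopsis_record(term, uri, abstract, retrieved=None):
    """Bundle a DBpedia abstract with its source URI and retrieval date,
    so a later change to the abstract can be traced to the version we
    captured."""
    return {
        "term": term,
        "external_link": uri,
        "synopsis_dbpedia": abstract,
        "retrieved": (retrieved or date.today()).isoformat(),
    }

record = make_synopsis_record(
    "Cognitive psychology",
    "http://dbpedia.org/resource/Cognitive_psychology",
    "Cognitive psychology is the study of mental processes...",
)
print(record["retrieved"])  # e.g. "2015-11-10"
```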
Can we improve on Spotlight? Possibly. We wouldn’t be comfortable with any scheme that linked PLOS concepts to the world of Linked Data sources without editorial quality control. But we suspect that a more sophisticated matching tool might promote those hits that fell within Spotlight matches 2-5 to the top hit, and would find some of the 8.4% of terms which were found manually but which Spotlight did not suggest in the top 5 hits at all. We hope to invest some effort in evaluating Silk, and establishing whether or not any other contenders are emerging.
Introducing PLOS Subject Area URIs into DBpedia pages: We explored this, and it seemed likely that the route to achieve it would be to add the PLOS URI first to the corresponding Wikipedia page, in the “External Links” section.
Figure 5: The External Links section of Wikipedia: Cognitive psychology


As DBpedia (slowly) crawls through all associated Wikipedia pages, eventually the new PLOS link would be added to the DBpedia entry for each page.
To demonstrate this methodology, we added a backlink to the corresponding PLOS Subject Area page in the Wikipedia article shown above (Cognitive psychology), as well as in the articles for all 21 Tier 3 Psychology terms.
Figure 6: External Links at Wikipedia: Cognitive psychology showing link back to the corresponding PLOS Subject Area page


Were DBpedia to re-crawl this page, the link to the PLOS page would be added to DBpedia’s corresponding page as well.
However, Wikipedia editors questioned the value of the PLOS backlinks (“link spam”) and their appropriateness to the “External Links” section of the various Wikipedia pages. A Wikipedia administrator can deem them inappropriate and remove them (as has happened for some if not all of them by the time you read this).
We believe the solution is to publish the PLOS thesaurus as Linked Open Data (in either SKOS or OWL format(s)) and assert the link to the published vocabulary from DBpedia (using the field owl:sameAs instead of dbo:wikiPageExternalLink). We are looking into the feasibility and mechanics of this.
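Published as SKOS, a thesaurus term and its DBpedia link might look like the following Turtle fragment (the PLOS concept URIs are hypothetical placeholders; the point is the owl:sameAs pattern):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dbr:  <http://dbpedia.org/resource/> .

<http://example.org/plos/thesaurus/cognitive-psychology>
    a skos:Concept ;
    skos:prefLabel "Cognitive psychology"@en ;
    skos:broader <http://example.org/plos/thesaurus/psychology> ;
    owl:sameAs dbr:Cognitive_psychology .
```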
Once the PLOS thesaurus is published in this way, the most likely candidate tool for interlinking the data would be the Silk Linked Data Integration Framework, and we look forward to exploring that possibility.
Appendix: The Psychology division of the PLOS thesaurus. LD_POC_blog.appendix
 November 10, 2015