![]() "For example you can subtract vectors using standard vector math. "What's important is not each number, but using the numbers to see how words are related to one another," said Jain, who leads a group working on discovery and design of new materials for energy applications using a mix of theory, computation, and data mining. Word2vec took each of the approximately 500,000 distinct words in those abstracts and turned each into a 200-dimensional vector, or an array of 200 numbers. The team collected the 3.3 million abstracts from papers published in more than 1,000 journals between 19. We thought, can machine learning do something to make use of all this collective knowledge in an unsupervised manner-without needing guidance from human researchers?" ![]() "A researcher can access only fraction of that. "In every research field there's 100 years of past research literature, and every week dozens more studies come out," he said. Tshitoyan said the project was motivated by the difficulty making sense of the overwhelming amount of published studies. "The paper establishes that text mining of scientific literature can uncover hidden knowledge, and that pure text-based extraction can establish basic scientific knowledge," said Ceder, who also has an appointment at UC Berkeley's Department of Materials Science and Engineering. Along with Jain, Berkeley Lab scientists Kristin Persson and Gerbrand Ceder helped lead the study. The lead author of the study, "Unsupervised Word Embeddings Capture Latent Knowledge from Materials Science Literature," is Vahe Tshitoyan, a Berkeley Lab postdoctoral fellow now working at Google. The findings were published July 3 in the journal Nature. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far." "That hinted at the potential of the technique. "Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals," said Jain. By analyzing relationships between words the algorithm was able to predict discoveries of new thermoelectric materials years in advance and suggest as-yet unknown materials as candidates for thermoelectric materials. A team led by Anubhav Jain, a scientist in Berkeley Lab's Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |